(Partially abridged from R geodata workshop 2019 & USC POIR Workshop on Maps in R, 2017)
We usually work with two-dimensional vector data, i.e., points located within a coordinate reference system (CRS).
For example, the point (-69.94°, 42.6°) is represented in the WGS84 CRS (the world geodetic system used by GPS).
A point describes a location on a sphere (well, actually WGS84 models the Earth as an ellipsoid), and its components, longitude and latitude, are given in degrees.
Points can be connected to form other geometric shapes such as lines or polygons. Points and shapes can represent many geographic entities: cities, buildings, countries, rivers, lakes, etc.
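As a minimal sketch of these ideas using the sf package (introduced below), we can construct a point and a line directly; the coordinates are illustrative:

```r
library(sf)

# a single point in the WGS84 CRS (EPSG code 4326)
pt <- st_sfc(st_point(c(-69.94, 42.6)), crs = 4326)

# connecting several points yields other geometries, e.g. a line
ln <- st_sfc(st_linestring(rbind(c(-69.94, 42.6), c(-70.5, 43.1))), crs = 4326)
```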
We need third-party packages to extend R in order to work with geographical data (or geodata). Here are some examples:
- maps - contains geodata for national borders and administrative regions for several countries
- rgeos, maptools - miscellaneous functions for operations on geometries
- sf ("Simple Features for R") - reading, writing and transforming spatial data
- rnaturalearth - allows interaction with Natural Earth map data (more on this later)
install.packages(c("maps", "sf"))
As a first example, we will create a simple map of the continental
United States. We get the data from the maps package and
plot it using ggplot2.
states_map <- map_data("state")
Let us look at the structure of the data we retrieved from
maps. The data is stored as a data frame and contains
observations that are characterized by unique combinations of longitude
and latitude values. Each observation has the following attributes:
group, order, region, and subregion if applicable.
head(states_map)
## long lat group order region subregion
## 1 -87.46201 30.38968 1 1 alabama <NA>
## 2 -87.48493 30.37249 1 2 alabama <NA>
## 3 -87.52503 30.37249 1 3 alabama <NA>
## 4 -87.53076 30.33239 1 4 alabama <NA>
## 5 -87.57087 30.32665 1 5 alabama <NA>
## 6 -87.58806 30.32665 1 6 alabama <NA>
The longitude and latitude information denotes points along the borders of countries and geographical regions. We can represent them as points in the x-y coordinate system, plotting longitude along the x-axis and latitude along the y-axis.
ggplot(states_map, aes(x = long, y = lat)) + geom_point()
Not really what we are aiming for, but it’s a start.
The group and order variables in the dataset encode relational information about the points. For example, there are 49 regions (all states minus Alaska and Hawaii, plus the District of Columbia), 63 groups, and a varying number of points (observations) within each group. The observations denote the border points we plotted previously, and the order counter establishes the sequence in which they should be drawn.
head(table(states_map$region))
##
## alabama arizona arkansas california colorado connecticut
## 202 149 312 516 79 91
We can use a geom_polygon() layer to plot the
observations as polygons, rather than points. In order to do that, we
need to specify the grouping parameter. If the data wasn’t ordered
correctly, we could use the dplyr::arrange() function to
establish the correct sequencing of points for each polygon.
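Such a repair step could look like this (a sketch only; the data from maps is already ordered correctly):

```r
# hypothetical repair step: sort border points by polygon (group)
# and by drawing order within each polygon
states_map <- dplyr::arrange(states_map, group, order)
```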
ggplot(states_map, aes(x = long, y = lat, group = group)) + geom_polygon()
The map appears to be a bit “squished”. This is because the scaling
of the x-axis and y-axis is not based on the scaling of longitude and
latitude. We can pass special mapping parameters to ggplot2
via the coord_map() command to achieve the correct aspect
ratio or use different map projections. Note that the
mapproj library is required to use this feature.
ggplot(states_map, aes(x = long, y = lat, group = group)) + geom_polygon() +
theme_minimal() + coord_map() # default is Mercator projection
ggplot(states_map, aes(x = long, y = lat, group = group)) + geom_polygon() +
theme_minimal() + coord_map("mollweide")
So far we explored a simple way to read and plot maps. As we will see in the next sections, more advanced and flexible methods are available.
Most packages for working with geo-data in R rely on the Simple Features standard, which describes how objects in the real world can be represented in computers in terms of their spatial geometry. A feature has a geometry describing where the feature is located on Earth, plus a set of attributes describing other properties.
A spatial dataset stores the geodata in a geometry column, alongside further attribute columns.
Now, we make a world map. First, we load the required packages:
library(ggplot2)
library(maps)
library(sf)
We can use map() to load the world data; then we convert
it to a “simple features” object with st_as_sf():
worldmap_data <- st_as_sf(map("world", plot = FALSE, fill = TRUE))
head(worldmap_data, n = 3) # to have a look inside
## Simple feature collection with 3 features and 1 field
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -70.06612 ymin: -18.01973 xmax: 74.89131 ymax: 38.4564
## Geodetic CRS: WGS 84
## ID geom
## 1 Aruba MULTIPOLYGON (((-69.89912 1...
## 2 Afghanistan MULTIPOLYGON (((74.89131 37...
## 3 Angola MULTIPOLYGON (((23.9665 -10...
We recognize this as a spatial dataset with a header giving some
general info (geometry type, dimension, etc.), and two columns:
ID (attribute) and geom (spatial data).
What are the numbers in the geom column?
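They are the longitude/latitude coordinates of the polygon vertices; we can extract them with st_coordinates() to check:

```r
# vertex coordinates of the first feature (Aruba): columns X (longitude)
# and Y (latitude), plus indices identifying the polygon parts
head(st_coordinates(worldmap_data$geom[1]))
```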
We make a basic plot of the world map by adding a
geom_sf layer to a ggplot() call:
ggplot() + geom_sf(data = worldmap_data)
Suppose you have a separate dataframe with the coordinates of a few landmarks you want to add to the map.
## # A tibble: 3 × 3
## name long lat
## <chr> <dbl> <dbl>
## 1 Berlin 13.4 52.5
## 2 New York -73.9 40.6
## 3 Sydney 151. -33.9
We can add these by simply adding a geom_point
layer:
ggplot(some_cities) + geom_sf(data = worldmap_data) + geom_point(aes(x = long,
y = lat), color = "red")
Of course we can also add labels with geom_label (which
draws a box around the text) or geom_text (which does not).
While we are at it, we also get rid of the quite uninformative axis text
and labels:
ggplot(some_cities) + geom_sf(data = worldmap_data) + geom_point(aes(x = long,
y = lat), color = "red") + geom_label(aes(x = long, y = lat,
label = name), hjust = 0, vjust = 1, nudge_x = 3) + theme(axis.title = element_blank(),
axis.text = element_blank())
Another improvement would be to turn off the plot grid.
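A quick sketch of that, building on the previous plot via ggplot2's last_plot():

```r
# remove the background grid from the last plot;
# theme_void() would strip all non-data elements at once
last_plot() + theme(panel.grid = element_blank())
```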
Different names may refer to the same country (e.g. Germany / Federal Republic of Germany / BRD; Italy / Italian Republic / Republic of Italy), therefore they are often not a good identifier.
ISO 3166-1 assigns every country a two-letter and a three-letter code (e.g. DE / DEU; IT / ITA), which are often used in datasets.
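If your data only contains country names, the countrycode package (an assumption: it is not used elsewhere in this document and must be installed separately) can translate them to ISO codes:

```r
library(countrycode)

# translate country names to ISO 3166-1 alpha-3 codes
countrycode(c("Germany", "Italy"),
            origin = "country.name", destination = "iso3c")
# "DEU" "ITA"
```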
Let’s create a map of Europe highlighting a random sample of EU countries together with their ISO codes: we’ll use Natural Earth as map provider.
library(rnaturalearth)
library(ggrepel)
# get Europe map from Natural Earth
world <- ne_countries(type = "map_units", returnclass = "sf",
continent = "europe")
# set coordinate system to EPSG 3035 (European LAEA
# projection) (more info: https://epsg.io/3035)
world <- st_transform(world, 3035)
world <- st_crop(world, xmin = 0, xmax = 7e+06, ymin = 0, ymax = 5e+06)
labelsdf <- data.frame(label = sprintf("%s (%s)\nISO Codes: %s / %s",
world$name, world$formal_en, world$iso_a2, world$iso_a3),
stringsAsFactors = FALSE)
labels_coords <- as.data.frame(st_coordinates(st_centroid(world$geometry)))
labelsdf$x <- labels_coords$X
labelsdf$y <- labels_coords$Y
set.seed(9)
world$show_label <- sample(c(TRUE, FALSE), nrow(labelsdf), replace = TRUE,
prob = c(0.15, 0.85)) & !is.na(world$iso_a2)
labelsdf <- labelsdf[world$show_label, ]
ggplot() + geom_sf(aes(fill = show_label), data = world) + geom_label_repel(aes(label = label,
x = x, y = y), data = labelsdf, size = 3, min.segment.length = 0,
alpha = 0.85) + scale_fill_brewer(guide = "none", palette = "Dark2") +
labs(title = "Random sample of European countries w/ ISO codes",
caption = "source: Natural Earth Data") + theme_bw() +
theme(axis.text = element_blank(), axis.title = element_blank(),
axis.ticks = element_blank())
The Nomenclature of Territorial Units for Statistics (NUTS: Nomenclature des Unités Territoriales Statistiques) is a geocode standard developed by Eurostat that divides the EU territory into regions at 3 different levels for socio-economic analyses of the regions.
Candidate countries and countries inside EFTA also have NUTS regions (Norway, Switzerland, Albania, Turkey, etc.).
Italy has 3 levels of NUTS:
- NUTS 1: groups of regions (macro-areas)
- NUTS 2: regions
- NUTS 3: provinces
Below the NUTS levels, Italy has two levels of local administrative units: level 1 corresponds to NUTS 3, and level 2 includes the 8094 Municipalities.
In the following, we plot the map of Italy at the NUTS level 2, annotating a number of randomly chosen regions.
# read NUTS level-2 info from Eurostat
nutsrg <- read_sf("https://raw.githubusercontent.com/eurostat/Nuts2json/master/pub/v2/2021/3857/20M/nutsrg_2.json")
# [optional] set coordinate system to EPSG 3857 (Web
# Mercator)
st_crs(nutsrg) <- 3857
# restrict to Italy NUTS codes
nutsrg <- nutsrg[grepl("^IT", nutsrg$id), ]
labelsdf <- data.frame(label = sprintf("%s: %s", nutsrg$id, nutsrg$na),
stringsAsFactors = FALSE)
labels_coords <- as.data.frame(st_coordinates(st_centroid(nutsrg$geometry)))
labelsdf <- bind_cols(labelsdf, labels_coords)
nutsrg$show_label <- rep(FALSE, nrow(nutsrg))
set.seed(10)
nutsrg$show_label[sample(1:nrow(nutsrg), 4)] <- TRUE
labelsdf <- labelsdf[nutsrg$show_label, ]
ggplot() + geom_sf(aes(fill = show_label), data = nutsrg) + geom_label_repel(aes(label = label,
x = X, y = Y), data = labelsdf, size = 3, min.segment.length = 0,
alpha = 0.85) + scale_fill_brewer(guide = "none", palette = "Dark2") +
labs(title = "Random sample of Italy NUTS level 2", caption = "source: Eurostat / https://github.com/eurostat/Nuts2json") +
theme_bw() + theme(axis.text = element_blank(), axis.title = element_blank(),
axis.ticks = element_blank())
A number of R libraries already ship with geodata, or provide ways to download it programmatically. For example:
naturalearthdata.com: "Natural Earth is a public domain map dataset available at 1:10m, 1:50m, and 1:110m scales. Featuring tightly integrated vector and raster data, with Natural Earth you can make a variety of visually pleasing, well-crafted maps with cartography or GIS software."
Provides vector data for cultural features (countries, administrative boundaries, populated places) as well as physical features (coastlines, rivers, lakes, etc.).
You can either download the data directly from the website or (even better) use the package rnaturalearth.
OSM Admin Boundaries Map is a web-service to download administrative boundaries worldwide for different levels in different formats (shapefile, GeoJSON, etc.). It also contains meta-data depending on the country.
There are many file formats that store data for geographic information system (GIS) software. The most popular include:
- Shapefile (.shp)
- GeoJSON (.geojson, .json)
- KML/KMZ (.kml, .kmz), used by Google Earth
- GML (.gml)
Additional formats exist, such as vector data files (.svg, .poly) or databases (.sql, .sqlite) - although they are not specific to geodata, they can be used for this purpose.
The good news is that the sf library can handle all
popular file formats!
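Reading works the same way regardless of format; the file names below are hypothetical:

```r
library(sf)

countries <- read_sf("countries.shp")      # Shapefile
regions   <- read_sf("regions.geojson")    # GeoJSON
landmarks <- read_sf("landmarks.kml")      # KML

# when writing, the driver is inferred from the file extension
st_write(regions, "regions.gpkg")
```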
dplyr: recap
Suppose you have two datasets: pm, a table with particulate matter measurements in some cities, and city_coords, a table with GPS coordinates of many cities.
| city | pm_mg |
|---|---|
| Amman | 999 |
| Saltillo | 869 |
| Usak | 814 |
| The Bronx | 284 |
| Seoul | 129 |
| city | lng | lat |
|---|---|---|
| Amman | 31.910 | 35.948 |
| Saltillo | -100.994 | 25.434 |
| Berlin | 13.402 | 52.520 |
| Usak | 29.403 | 38.673 |
| New York | -73.940 | 40.600 |
| Seoul | 126.984 | 37.536 |
To combine these datasets (for example, to create a choropleth map),
we can use the city column for matching. We note that not
all cities in pm also appear in city_coords,
and vice versa.
Recall dplyr's three ways of joining tables:
left_join(a, b, by = column) always retains rows of the left side and fills non-matching rows with NAs.
knitr::kable(left_join(pm, city_coords, by = "city"))
| city | pm_mg | lng | lat |
|---|---|---|---|
| Amman | 999 | 31.910 | 35.948 |
| Saltillo | 869 | -100.994 | 25.434 |
| Usak | 814 | 29.403 | 38.673 |
| The Bronx | 284 | NA | NA |
| Seoul | 129 | 126.984 | 37.536 |
right_join(a, b, by = column) always retains rows of the right side and fills non-matching rows with NAs.
knitr::kable(right_join(pm, city_coords, by = "city"))
| city | pm_mg | lng | lat |
|---|---|---|---|
| Amman | 999 | 31.910 | 35.948 |
| Saltillo | 869 | -100.994 | 25.434 |
| Usak | 814 | 29.403 | 38.673 |
| Seoul | 129 | 126.984 | 37.536 |
| Berlin | NA | 13.402 | 52.520 |
| New York | NA | -73.940 | 40.600 |
inner_join(a, b, by = column) only retains rows that match on both sides.
knitr::kable(inner_join(pm, city_coords, by = "city"))
| city | pm_mg | lng | lat |
|---|---|---|---|
| Amman | 999 | 31.910 | 35.948 |
| Saltillo | 869 | -100.994 | 25.434 |
| Usak | 814 | 29.403 | 38.673 |
| Seoul | 129 | 126.984 | 37.536 |
In choropleth maps, regions are shaded according to the values of some variable, e.g.:
knitr::include_graphics("https://www.un.org/waterforlifedecade/images/scarcity/2013_scarcity_graph_2.png")
By the way, the above picture is a nice example of a bad color scheme (why?).
We now want to create a choropleth map using data from the Berlin Senate Department for Urban Development and Housing (through FIS Broker). The data include indicators for monitoring of social development.
# get the geoJSON
bln_soc <- read_sf("https://github.com/WZBSocialScienceCenter/r-geodata-workshop/raw/master/data/bln_plr_sozind.geojson")
head(bln_soc)
## Simple feature collection with 6 features and 10 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 386668.2 ymin: 5817761 xmax: 390764.3 ymax: 5820432
## Projected CRS: ETRS89 / UTM zone 33N
## # A tibble: 6 × 11
## PLANNAME EW2016 STATUS1 STATUS2 STATUS3 STATUS4 DYNAMO1 DYNAMO2 DYNAMO3
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Stülerstraße 3114 4.98 1.52 8.16 20.5 -0.19 0.14 -0.42
## 2 Großer Tiergar… 192 NA NA NA NA NA NA NA
## 3 Lützowstraße 5518 6.83 2.08 17.7 34.3 -0.5 -0.03 -0.8
## 4 Körnerstraße 4704 6.27 1.49 20.7 45.9 -1.33 -0.76 -1.66
## 5 Nördlicher Lan… 1135 2.06 0 1.5 3.55 -0.77 -0.74 -0.42
## 6 Wilhelmstraße 2190 3.64 0.89 8.45 27.3 -1.23 -0.59 0.13
## # … with 2 more variables: DYNAMO4 <dbl>, geometry <MULTIPOLYGON [m]>
Our aim is to visualize the spatial distribution of STATUS4, the child poverty rate in percent (i.e., the share of children under 15 years old living in a household that receives social support).
head(bln_soc[c("PLANNAME", "STATUS4", "geometry")], 5)
## Simple feature collection with 5 features and 2 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 386668.2 ymin: 5817761 xmax: 389894.1 ymax: 5820432
## Projected CRS: ETRS89 / UTM zone 33N
## # A tibble: 5 × 3
## PLANNAME STATUS4 geometry
## <chr> <dbl> <MULTIPOLYGON [m]>
## 1 Stülerstraße 20.5 (((387256.6 5818552, 387258.6 5818545, 38726…
## 2 Großer Tiergarten NA (((386767.5 5819393, 386767.3 5819393, 38676…
## 3 Lützowstraße 34.3 (((387952.6 5818275, 387974.2 5818265, 38799…
## 4 Körnerstraße 45.9 (((388847.1 5817875, 388881.2 5817863, 38897…
## 5 Nördlicher Landwehrkanal 3.55 (((388129.5 5819015, 388142.8 5818978, 38815…
The first attempt is to directly plot the map with fill
aesthetics depending on STATUS4:
ggplot(bln_soc) + geom_sf(aes(fill = STATUS4))
But there’s a problem: STATUS4 is continuous, so the color scale is continuous too. For choropleth maps, discrete ranges make the colors easier to tell apart.
We need to do some binning: we decide to set 5 discrete bins (equally spaced from 0 to 100%). In the revised plot, we also change the color palette and adjust the appearance:
# binning
bln_soc$child_pov_bins <- cut(bln_soc$STATUS4, seq(0, 100, by = 20))
# new plot
ggplot(bln_soc) +
geom_sf(aes(fill = child_pov_bins)) +
scale_fill_brewer(palette = 'OrRd', na.value = "grey90",
guide = guide_legend(title = 'Child pov.\nranges (%)')) +
coord_sf(datum = NA) + # disable lines ("graticule") in the background
theme_void()
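Equal-width bins are a choice, not a necessity; a common alternative, sketched here, is quantile breaks, which put roughly the same number of regions into each bin:

```r
# quantile-based breaks: each bin gets roughly the same number of regions
brks <- quantile(bln_soc$STATUS4, probs = seq(0, 1, by = 0.2), na.rm = TRUE)
bln_soc$child_pov_qbins <- cut(bln_soc$STATUS4, breaks = brks,
                               include.lowest = TRUE)
table(bln_soc$child_pov_qbins)
```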
Most of the time, you’ll have at least two datasets: one containing the measurement(s) you want to show, and the other containing the geodata.
In the following example, we focus on data related to People at risk of poverty or social exclusion by NUTS level-2 regions (tgs00107) from Eurostat.
First, we load the data:
pov_risk <- read_csv("https://github.com/WZBSocialScienceCenter/r-geodata-workshop/raw/master/data/tgs00107_pov_risk_nuts2.csv")
pov_risk$risk_pct_bins <- cut(pov_risk$risk_pct, seq(0, 100,
by = 10))
pov_risk_2016 <- filter(pov_risk, year == 2016) # 2016 has fewest NAs
head(pov_risk_2016)
## # A tibble: 6 × 4
## nuts year risk_pct risk_pct_bins
## <chr> <dbl> <dbl> <fct>
## 1 AT 2016 18 (10,20]
## 2 AT11 2016 15.1 (10,20]
## 3 AT12 2016 13 (10,20]
## 4 AT13 2016 26 (20,30]
## 5 AT21 2016 16.3 (10,20]
## 6 AT22 2016 18.4 (10,20]
Next, we load the GeoJSON containing the NUTS level-2 regions provided by Eurostat (the same we already used in one of the above examples):
# read NUTS level-2 info from Eurostat
nutsrg <- read_sf("https://raw.githubusercontent.com/eurostat/Nuts2json/master/pub/v2/2021/3857/20M/nutsrg_2.json")
head(nutsrg)
## Simple feature collection with 6 features and 2 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 886184 ymin: 6159172 xmax: 2098645 ymax: 6623262
## Geodetic CRS: WGS 84
## # A tibble: 6 × 3
## id na geometry
## <chr> <chr> <MULTIPOLYGON [°]>
## 1 CZ05 Severovýchod (((1729426 6582279, 1758628 6576034, 1786629 6556127, 1…
## 2 CZ06 Jihovýchod (((1725026 6422248, 1825031 6373848, 1870634 6376971, 1…
## 3 CZ07 Střední Morava (((2048642 6342623, 2039842 6328571, 2022641 6319594, 2…
## 4 CZ08 Moravskoslezsko (((2007840 6457767, 2040242 6433957, 2067843 6430835, 2…
## 5 DE11 Stuttgart (((1122596 6367603, 1126196 6355503, 1125796 6340281, 1…
## 6 DE12 Karlsruhe (((1068993 6347697, 1051393 6336768, 1020191 6330523, 1…
We notice that both datasets contain a NUTS level-2 identifier, so it is easy to join them:
pov_risk_2016_geo <- left_join(nutsrg, pov_risk_2016, by = c(id = "nuts"))
head(pov_risk_2016_geo)
## Simple feature collection with 6 features and 5 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 886184 ymin: 6159172 xmax: 2098645 ymax: 6623262
## Geodetic CRS: WGS 84
## # A tibble: 6 × 6
## id na geometry year risk_pct risk_pct_bins
## <chr> <chr> <MULTIPOLYGON [°]> <dbl> <dbl> <fct>
## 1 CZ05 Severovýchod (((1729426 6582279, 175862… 2016 11.7 (10,20]
## 2 CZ06 Jihovýchod (((1725026 6422248, 182503… 2016 11.5 (10,20]
## 3 CZ07 Střední Morava (((2048642 6342623, 203984… 2016 13.1 (10,20]
## 4 CZ08 Moravskoslezsko (((2007840 6457767, 204024… 2016 22.1 (20,30]
## 5 DE11 Stuttgart (((1122596 6367603, 112619… 2016 14.6 (10,20]
## 6 DE12 Karlsruhe (((1068993 6347697, 105139… 2016 16.2 (10,20]
Now we have everything we need to plot the data as a choropleth map!
ggplot(pov_risk_2016_geo) +
geom_sf(aes(fill = risk_pct_bins), size=0.1) +
scale_fill_brewer(palette="OrRd", na.value="grey90",
guide=guide_legend(title="Pct. of people at\nrisk of poverty")) +
labs(title="Poverty in the EU (2016)",
subtitle="Percentage of people at risk of poverty or social exclusion",
caption="src: Eurostat") +
coord_sf(datum=NA) + # disable lines ("graticule") in the background
theme_void()
The pov_risk dataset has a time dimension (column year), so it is tempting to create spatio-temporal visualizations that take advantage of this.
We have a couple of options:
- small multiples, i.e. one map per year (via facet_wrap);
- an animation, i.e. one frame per year (via gganimate).
To prepare the data for plotting, we focus on 4 years only: 2010, 2012, 2014, and 2016. When dealing with spatio-temporal data, we have to consider that maps change over time! In fact, the NUTS regions are released by Eurostat for 2010, 2013, and 2016.
So there’s a little overhead here: we need to assign each year in the pov_risk dataframe the corresponding map year (i.e., “data year” 2012 should be assigned “map year” 2010 because it is the closest match in the past). Then, we need to fetch the GeoJSON files for each “map year”.
# 0. select 2010, 2012, 2014, 2016
pov_risk_10_to_16 <- filter(pov_risk, year %in% seq(2010, 2016, by = 2))
# 1. assign each year in the poverty data the corresponding year of the map
pov_risk_10_to_16 <- mutate(pov_risk_10_to_16,
map_year = case_when(year < 2013 ~ 2010L,
year == 2014 ~ 2013L,
year > 2014 ~ 2016L))
# 2. load geo-data for each available year (2010, 2013 and 2016)
map_yrs <- unique(pov_risk_10_to_16$map_year)
nutsrg_per_yr <- list()
i <- 1
for (yr in map_yrs) {
nutsrg_per_yr[[i]] <- read_sf(glue::glue("https://raw.githubusercontent.com/eurostat/Nuts2json/master/pub/v2/{yr}/3857/20M/nutsrg_2.json"))
nutsrg_per_yr[[i]]$map_year <- yr
i <- i + 1
}
# generate a single data frame with each year's geo-data
nutsrg_all_yrs <- do.call(rbind, nutsrg_per_yr)
# join geo-data and poverty data also on year level
pov_risk_10_to_16_geo <- left_join(nutsrg_all_yrs, pov_risk_10_to_16, by = c('id' = 'nuts', 'map_year'))
# repeat missing geo-data for each year and set the respective year
yr10_na <- filter(pov_risk_10_to_16_geo, map_year == 2010L & is.na(year))
yr10_na$year <- 2010
pov_risk_10_to_16_geo <- rbind(pov_risk_10_to_16_geo, yr10_na)
yr10_na$year <- 2012
pov_risk_10_to_16_geo <- rbind(pov_risk_10_to_16_geo, yr10_na)
yr13_na <- filter(pov_risk_10_to_16_geo, map_year == 2013L & is.na(year))
yr13_na$year <- 2014
pov_risk_10_to_16_geo <- rbind(pov_risk_10_to_16_geo, yr13_na)
yr16_na <- filter(pov_risk_10_to_16_geo, map_year == 2016L & is.na(year))
yr16_na$year <- 2016
pov_risk_10_to_16_geo <- rbind(pov_risk_10_to_16_geo, yr16_na)
# drop the rows with the missing geo-data
pov_risk_10_to_16_geo <- filter(pov_risk_10_to_16_geo, !is.na(year))
# 3. make the map using facets
ggplot(pov_risk_10_to_16_geo) +
geom_sf(aes(fill = risk_pct_bins), size = 0.1) +
scale_fill_brewer(palette="OrRd", na.value="grey90",
guide = guide_legend(title = 'Pct. of people at\nrisk of poverty')) +
  labs(title = 'Poverty in southern EU 2010-2016 (NUTS level-2)',
subtitle = 'Percentage of people at risk of poverty or social exclusion',
caption = 'src: Eurostat') +
coord_sf(datum = NA, xlim = c(-12e5, 35e5), ylim = c(41e5, 68e5)) +
facet_wrap(~ year, nrow = 2) + # facets per year
theme_bw() + theme(axis.text = element_blank(),
axis.title = element_blank(),
axis.ticks = element_blank(),
legend.position = 'bottom')
library(gganimate)
p <- ggplot(pov_risk_10_to_16_geo) + geom_sf(aes(fill = risk_pct_bins),
size = 0.1) + scale_fill_brewer(palette = "OrRd", na.value = "grey90",
guide = "none") + coord_sf(datum = NA, xlim = c(-1200000,
3500000), ylim = c(4100000, 6800000)) + theme_bw() + theme(axis.text = element_blank(),
axis.title = element_blank(), axis.ticks = element_blank(),
legend.position = "bottom", title = element_text(size = 7),
legend.text = element_text(size = 5)) + transition_manual(year) +
labs(title = "Poverty in southern EU {current_frame} (NUTS level-2)",
subtitle = "Percentage of people at risk of poverty or social exclusion",
caption = "source: Eurostat")
animate(p, width = 700, height = 450, duration = 5, renderer = ffmpeg_renderer(format = "webm"))
While you can unroll a cylinder or a cone into a flat sheet without tearing or stretching, you cannot do the same with a sphere or an ellipsoid: they are “non-developable surfaces”.
source: flickr.com/photos/waferboard
Map projections attempt the impossible: projecting points on a sphere onto a plane. We already worked with WGS84 coordinates, which are locations on a sphere (ellipsoid) given in degrees. A map projection converts points from the spherical/ellipsoidal coordinate system to a “flat”, cartesian coordinate system.
All map projections distort: area, shape, direction, distance and other properties can differ from those on the actual sphere.
Examples (figures omitted): the Mercator, Mollweide, and Goode homolosine projections.
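Distortion can also be checked numerically. For instance, a sketch using the worldmap_data object from above: compare the area of a mid-latitude country computed on the ellipsoid with its area after projecting to Web Mercator:

```r
de <- worldmap_data[worldmap_data$ID == "Germany", ]

# area computed on the WGS84 ellipsoid (geodesic computation)
st_area(de)

# area of the same shape after projecting to Web Mercator (EPSG 3857):
# noticeably inflated, since Mercator stretches areas away from the equator
st_area(st_transform(de, 3857))
```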
Spatial datasets usually specify the CRS, so you should be able to
set or transform it (e.g., st_transform(map, 3035)). You
may want to use a different projection to have less distortion in the
area of your interest. Another important point that you should take into
account is the compatibility between datasets: for example, you may have
country shapes with a specific CRS, and geocoded points with a different
CRS.
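A typical fix, sketched here with hypothetical sf objects, is to transform one dataset into the CRS of the other before combining them:

```r
# country_shapes and geocoded_points are hypothetical sf objects with
# different CRSs; transform the points into the CRS of the polygons
# before combining them, e.g. with a spatial join
geocoded_points <- st_transform(geocoded_points, st_crs(country_shapes))
joined <- st_join(geocoded_points, country_shapes)
```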